Search | VHL Regional Portal

1.

Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.

Cavalcanti, Julio Cesar; da Silva, Ronaldo Rodrigues; Eriksson, Anders; Barbosa, Plinio A.

Front Artif Intell ; 7: 1287877, 2024.

Article in English | MEDLINE | ID: mdl-38405218

ABSTRACT

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system utilizing the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations involved comparing identical twins, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples, ranging from 5 to 30 s, were assessed based on equal error rate (EER) and log-likelihood-ratio cost (Cllr) values. Results highlight the substantial challenge posed by identical twins to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those using shorter samples. As sample length increased, standard deviation values for both intra- and inter-speaker similarity scores decreased, indicating reduced variability in estimating speaker similarity/dissimilarity levels in longer speech stretches compared to shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems. These outcomes align with prior research and are discussed within the context of relevant literature.
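
As a rough illustration of the equal error rate mentioned in this abstract, the sketch below sweeps a decision threshold over similarity scores and reports the point where the false-accept and false-reject rates are closest. The function name and the toy scores are hypothetical; they are not taken from the study, and this is not the SpeechBrain implementation.

```python
def equal_error_rate(same_scores, diff_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    same_scores: similarity scores for same-speaker comparisons.
    diff_scores: similarity scores for different-speaker comparisons.
    """
    thresholds = sorted(set(same_scores) | set(diff_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False reject: a same-speaker pair scored below the threshold.
        frr = sum(s < t for s in same_scores) / len(same_scores)
        # False accept: a different-speaker pair scored at or above it.
        far = sum(s >= t for s in diff_scores) / len(diff_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (assumed, for illustration only): the second set of
# different-speaker scores overlaps the same-speaker scores, as might
# happen when very similar voices are compared.
same = [0.9, 0.8, 0.7, 0.6]
unrelated = [0.1, 0.2, 0.3, 0.4]
similar_voices = [0.5, 0.65, 0.75, 0.4]

print("EER, well-separated scores:", equal_error_rate(same, unrelated))
print("EER, overlapping scores:", equal_error_rate(same, similar_voices))
```

The more the different-speaker scores overlap the same-speaker scores, the higher the resulting EER, which is the pattern the abstract reports for twin comparisons.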

2.

On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

Front Psychol ; 14: 1101187, 2023.

Article in English | MEDLINE | ID: mdl-37138997

ABSTRACT

This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker discriminatory performance of different acoustic-phonetic estimates. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals, and interviews conducted between each individual participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning from temporal and melodic to spectral acoustic-phonetic estimates. Ultimately, an analysis based on the combination of different parameters was also conducted. Two speaker discriminatory metrics were examined: log-likelihood-ratio cost (Cllr) and Equal Error Rate (EER) values. A general speaker discriminatory trend was suggested when assessing the parameters individually. Parameters pertaining to the temporal acoustic-phonetic class showed the weakest performance in terms of speaker contrasting power, as evidenced by the relatively higher Cllr and EER values. Moreover, from the set of acoustic parameters assessed, spectral parameters, mainly high formant frequencies, i.e., F3 and F4, were the best performing in terms of speaker discrimination, showing the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry concerning parameters from different acoustic-phonetic classes, in which temporal parameters tended to present a lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task, by undermining the overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.
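
For readers unfamiliar with Cllr, the sketch below shows the standard definition of the log-likelihood-ratio cost, computed from likelihood ratios (LRs) for same-speaker and different-speaker comparisons. The function name and the toy LR values are assumptions made for illustration; the abstract does not describe how the authors computed the metric.

```python
import math


def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost.

    Same-speaker comparisons are penalised when their LR is small,
    different-speaker comparisons when their LR is large. A system that
    always outputs LR = 1 (no information) scores exactly 1.0; lower is
    better, and values above 1.0 indicate misleading LRs.
    """
    same_term = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    diff_term = sum(math.log2(1 + lr) for lr in diff_speaker_lrs)
    return 0.5 * (same_term / len(same_speaker_lrs)
                  + diff_term / len(diff_speaker_lrs))


# Toy likelihood ratios (assumed values, not results from the study).
informative = cllr([100.0, 50.0], [0.01, 0.02])
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
print("Cllr, informative system:", informative)
print("Cllr, uninformative system:", uninformative)
```

Unlike EER, which depends only on how well the scores rank the comparisons, Cllr also penalises poorly calibrated LRs, which is one reason the two metrics are often reported together.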

3.

Microphone and Audio Compression Effects on Acoustic Voice Analysis: A Pilot Study.

Cavalcanti, Julio Cesar; Englert, Marina; Oliveira, Miguel; Constantini, Ana Carolina.

J Voice ; 37(2): 162-172, 2023 Mar.

Article in English | MEDLINE | ID: mdl-33451892

ABSTRACT

OBJECTIVE: This study aimed to analyze the effects of microphone and audio compression variables on voice and speech parameters acquisition. METHOD: Acoustic measures were recorded and compared using a high-quality reference microphone and three testing microphones. The tested microphones displayed differences in specifications and acoustic properties. Furthermore, the impact of the audio compression was assessed by resampling the original uncompressed audio files into the MPEG-1/2 Audio Layer 3 (mp3) format at three different compression rates (128 kbps, 64 kbps, 32 kbps). Eight speakers were recruited in each recording session and asked to produce four sustained vowels: two [a] segments and two [ɛ] segments. The audio was captured simultaneously by the reference and tested microphones. The recordings were synchronized and analyzed using the Praat software. RESULTS: From a set of eight acoustic parameters assessed (f0, F1, F2, jitter%, shimmer%, HNR, H1-H2, and CPP), three (f0, F2, and jitter%) were suggested as resistant regarding the microphone and audio compression variables. In contrast, some parameters seemed to be significantly affected by both factors: HNR, H1-H2, and CPP; while shimmer% was found sensitive only concerning the latter factor. Moreover, higher compression rates appeared to yield more frequent acoustic distortions than lower rates. CONCLUSION: Overall, the outcomes suggest that acoustic parameters are influenced by both the microphone selection and the audio compression usage, which may reflect the practical implications of these components on the acoustic analysis reliability.

Subject(s)

Speech Acoustics , Voice , Humans , Pilot Projects , Reproducibility of Results , Acoustics

4.

Multi-parametric analysis of speech timing in inter-talker identical twin pairs and cross-pair comparisons: Some forensic implications.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

PLoS One ; 17(1): e0262800, 2022.

Article in English | MEDLINE | ID: mdl-35061853

ABSTRACT

The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings comprised spontaneous dialogues between twin pairs over mobile phones, recorded directly with professional headset microphones. Speaker comparisons were performed with twin speakers engaged in a dialogue (i.e., intra-twin pairs) and among all subjects (i.e., cross-twin pairs). The participants were 20 Brazilian Portuguese speakers, ten male identical twin pairs from the same dialectal area. A set of 11 speech timing parameters was extracted and analyzed, including speech rate, articulation rate, syllable duration (V-V unit), vowel duration, and pause duration. Three system performance estimates were considered for assessing the suitability of the parameters for speaker comparison purposes, namely global Cllr, EER, and AUC values. These were interpreted while also taking into consideration the analysis of effect sizes. Overall, speech rate and articulation rate were found to be the most reliable parameters, displaying the largest effect sizes for the factor "speaker" and the best system performance outcomes, namely the lowest Cllr and EER and the highest AUC values. Conversely, smaller effect sizes were found for the other parameters, which is compatible with a lower explanatory potential of the speaker identity on the duration of such units and a possibly higher linguistic control regarding their temporal variation. In addition, there was a tendency for speech timing estimates based on larger temporal intervals to present larger effect sizes and better speaker-discriminatory performance. Finally, identical twin pairs were found to be remarkably similar in their speech temporal patterns at the macro and micro levels while engaging in a dialogue, resulting in poor system discriminatory performance. Possible underlying factors for such a striking convergence in identical twins' speech timing patterns are presented and discussed.
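
The two best-performing parameters in this abstract differ only in how pauses are treated. The sketch below uses the conventional definitions (speech rate counts pause time, articulation rate excludes it); the function names, the choice of syllables as the counting unit, and the example figures are assumptions, since the abstract does not give the exact operationalisation.

```python
def speech_rate(n_syllables, total_duration_s):
    """Syllables per second over the whole stretch, pauses included."""
    if total_duration_s <= 0:
        raise ValueError("total duration must be positive")
    return n_syllables / total_duration_s


def articulation_rate(n_syllables, total_duration_s, pause_duration_s):
    """Syllables per second over speaking time only, pauses excluded."""
    speaking_time_s = total_duration_s - pause_duration_s
    if speaking_time_s <= 0:
        raise ValueError("pause time must be shorter than total duration")
    return n_syllables / speaking_time_s


# Hypothetical stretch: 20 syllables in 5 s, of which 1 s is silent pause.
print("speech rate:", speech_rate(20, 5.0), "syll/s")
print("articulation rate:", articulation_rate(20, 5.0, 1.0), "syll/s")
```

Because pause time is removed from the denominator, articulation rate is never lower than speech rate for the same stretch.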

Subject(s)

Speech , Twins, Monozygotic/psychology , Adult , Forensic Psychology , Humans , Male , Phonetics , Speech Perception , Tape Recording , Time Factors , Young Adult

5.

Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

J Voice ; 2021 Oct 07.

Article in English | MEDLINE | ID: mdl-34629229

ABSTRACT

OBJECTIVE: To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in comparisons within identical twin pairs and in cross-pair comparisons (i.e., among all speakers). PARTICIPANTS: A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. METHOD: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed in connected speech samples and in lengthened vowels at least 160 ms long produced during spontaneous speech. RESULTS: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in intra-twin pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twin pairs were found statistically different regarding their f0 patterns in connected speech, but not for lengthened vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal the control of speaking style and dialect on dynamic patterns of f0. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER). CONCLUSIONS: The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, but only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration to f0 baseline. Furthermore, f0 differences across subjects were suggested as more expressive in connected speech than in lengthened vowels.

6.

Acoustic analysis of vowel formant frequencies in genetically-related and non-genetically related speakers with implications for forensic speaker comparison.

Cavalcanti, Julio Cesar; Eriksson, Anders; Barbosa, Plinio A.

PLoS One ; 16(2): e0246645, 2021.

Article in English | MEDLINE | ID: mdl-33600430

ABSTRACT

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels' acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction and analysis of the first four speech formants F1-F4 were carried out using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs while being directly recorded through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. As for comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the middle points of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant when following a psychoacoustic criterion. The results revealed consistent patterns regarding the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying a greater speaker-discriminatory power compared to low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. As for non-genetically related speakers, both F3 and F4 displayed a similar high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed a higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found even more explanatory in terms of the observed differences. Although identical twins displayed a higher phonetic similarity, they were not found phonetically identical.
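
The abstract reports formant values in both Hertz and Bark but does not say which conversion was used. The sketch below uses Traunmüller's (1990) approximation as an assumed stand-in; the function names and the example formant values are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Convert a frequency in Hz to Bark (Traunmueller 1990 approximation).

    Assumption: the study may have used a different Hz-to-Bark formula.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_difference(freq_a_hz, freq_b_hz):
    """Absolute distance between two frequencies on the Bark scale."""
    return abs(hz_to_bark(freq_a_hz) - hz_to_bark(freq_b_hz))


# Hypothetical formant values: the same 100 Hz gap is a larger
# psychoacoustic distance at low frequencies than at high frequencies.
print("F1 gap, 500 vs 600 Hz:", bark_difference(500.0, 600.0), "Bark")
print("F4 gap, 3500 vs 3600 Hz:", bark_difference(3500.0, 3600.0), "Bark")
```

Because the Bark scale compresses high frequencies, a fixed difference in Hertz between two speakers counts for less perceptually in F4 than in F1, which is why a psychoacoustic criterion can change which measured differences are treated as meaningful.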

Subject(s)

Speech Acoustics , Speech/physiology , Verbal Behavior/physiology , Acoustics , Adult , Brazil , Forensic Sciences/methods , Humans , Language , Male , Phonetics , Psychoacoustics , Speech Perception/physiology , Twins, Monozygotic
